{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the second of the two notebooks where we aim to illustrate how one could use this library to build recommendation algorithms using the example in this [Kaggle notebook](https://www.kaggle.com/code/matanivanov/wide-deep-learning-for-recsys-with-pytorch) as guidance. In the previous notebook we used `pytorch-widedeep` to build a model that replicated almost exactly that in the notebook. In this, shorter notebook we will show how one could use the library to explore other models, following the same problem formulation, this is: given a state of a user at a certain point in time having watched a series of movies, our goal is to predict which movie the user will watch next. \n",
    "\n",
    "Assuming that one has read (and run) the previous notebook, the required data will be stored in a local dir called `prepared_data`, so let's read it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "import numpy as np\n",
    "import torch\n",
    "import pandas as pd\n",
    "from torch import nn\n",
    "\n",
    "from pytorch_widedeep import Trainer\n",
    "from pytorch_widedeep.utils import pad_sequences\n",
    "from pytorch_widedeep.models import TabMlp, WideDeep, Transformer\n",
    "from pytorch_widedeep.preprocessing import TabPreprocessor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "save_path = Path(\"prepared_data\")\n",
    "\n",
    "PAD_IDX = 0\n",
    "\n",
    "id_cols = [\"user_id\", \"movie_id\"]\n",
    "\n",
    "df_train = pd.read_pickle(save_path / \"df_train.pkl\")\n",
    "df_valid = pd.read_pickle(save_path / \"df_valid.pkl\")\n",
    "df_test = pd.read_pickle(save_path / \"df_test.pkl\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "...remember that in the previous notebook we explained that we are not  going to use a validation set here (in a real-world example, or simply a more realistic example, one should always use it).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_test = pd.concat([df_valid, df_test], ignore_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also remember that, in the previous notebook we discussed that the `'maxlen'` and `'max_movie_index'` parameters should be computed using only the train set. In particular, to properly do the tokenization, one would have to use ONLY train tokens and add a token for new 'unknown'/'unseen' movies in the test set. This can also be done with this library or manually, so I will leave it to the reader to implement that tokenzation appraoch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "maxlen = max(\n",
    "    df_train.prev_movies.apply(lambda x: len(x)).max(),\n",
    "    df_test.prev_movies.apply(lambda x: len(x)).max(),\n",
    ")\n",
    "\n",
    "max_movie_index = max(df_train.movie_id.max(), df_test.movie_id.max())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From now one things are pretty simple, moreover bearing in mind that in this example we are not going to use a wide component since, in pple, one would believe that the information in that component is also 'carried' by the movie sequences (However in the previous notebook, if one performs ablation studies, these suggest that most of the prediction power comes from the linear, wide model).\n",
    "\n",
    "In the example here we are going to explore one (of many) possibilities. We are simply going to encode the triplet `(user, item, rating)` and use it as a `deeptabular` component and the sequences of previously watched movies as the `deeptext` component. For the `deeptext` component we are going to use a basic encoder-only transformer model.\n",
    "\n",
    "Let's start with the tabular data preparation\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_train_user_item = df_train[[\"user_id\", \"movie_id\", \"rating\"]]\n",
    "train_movies_sequences = df_train.prev_movies.apply(\n",
    "    lambda x: [int(el) for el in x]\n",
    ").to_list()\n",
    "y_train = df_train.target.values.astype(int)\n",
    "\n",
    "df_test_user_item = df_train[[\"user_id\", \"movie_id\", \"rating\"]]\n",
    "test_movies_sequences = df_test.prev_movies.apply(\n",
    "    lambda x: [int(el) for el in x]\n",
    ").to_list()\n",
    "y_test = df_test.target.values.astype(int)\n",
    "\n",
    "tab_preprocessor = tab_preprocessor = TabPreprocessor(\n",
    "    cat_embed_cols=[\"user_id\", \"movie_id\", \"rating\"],\n",
    ")\n",
    "X_train_tab = tab_preprocessor.fit_transform(df_train_user_item)\n",
    "X_test_tab = tab_preprocessor.transform(df_test_user_item)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And not the text component, simply padding the sequences:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_text = np.array(\n",
    "    [\n",
    "        pad_sequences(\n",
    "            s,\n",
    "            maxlen=maxlen,\n",
    "            pad_first=False,\n",
    "            pad_idx=PAD_IDX,\n",
    "        )\n",
    "        for s in train_movies_sequences\n",
    "    ]\n",
    ")\n",
    "X_test_text = np.array(\n",
    "    [\n",
    "        pad_sequences(\n",
    "            s,\n",
    "            maxlen=maxlen,\n",
    "            pad_first=False,\n",
    "            pad_idx=0,\n",
    "        )\n",
    "        for s in test_movies_sequences\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now define the model components and the wide and deep model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "tab_mlp = TabMlp(\n",
    "    column_idx=tab_preprocessor.column_idx,\n",
    "    cat_embed_input=tab_preprocessor.cat_embed_input,\n",
    "    mlp_hidden_dims=[1024, 512, 256],\n",
    "    mlp_activation=\"relu\",\n",
    ")\n",
    "\n",
    "# plenty of options here, see the docs\n",
    "transformer = Transformer(\n",
    "    vocab_size=max_movie_index + 1,\n",
    "    embed_dim=32,\n",
    "    n_heads=2,\n",
    "    n_blocks=2,\n",
    "    seq_length=maxlen,\n",
    ")\n",
    "\n",
    "wide_deep_model = WideDeep(\n",
    "    deeptabular=tab_mlp, deeptext=transformer, pred_dim=max_movie_index + 1\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "WideDeep(\n",
       "  (deeptabular): Sequential(\n",
       "    (0): TabMlp(\n",
       "      (cat_embed): DiffSizeCatEmbeddings(\n",
       "        (embed_layers): ModuleDict(\n",
       "          (emb_layer_user_id): Embedding(749, 65, padding_idx=0)\n",
       "          (emb_layer_movie_id): Embedding(1612, 100, padding_idx=0)\n",
       "          (emb_layer_rating): Embedding(6, 4, padding_idx=0)\n",
       "        )\n",
       "        (embedding_dropout): Dropout(p=0.0, inplace=False)\n",
       "      )\n",
       "      (encoder): MLP(\n",
       "        (mlp): Sequential(\n",
       "          (dense_layer_0): Sequential(\n",
       "            (0): Linear(in_features=169, out_features=1024, bias=True)\n",
       "            (1): ReLU(inplace=True)\n",
       "            (2): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "          (dense_layer_1): Sequential(\n",
       "            (0): Linear(in_features=1024, out_features=512, bias=True)\n",
       "            (1): ReLU(inplace=True)\n",
       "            (2): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "          (dense_layer_2): Sequential(\n",
       "            (0): Linear(in_features=512, out_features=256, bias=True)\n",
       "            (1): ReLU(inplace=True)\n",
       "            (2): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "      )\n",
       "    )\n",
       "    (1): Linear(in_features=256, out_features=1683, bias=True)\n",
       "  )\n",
       "  (deeptext): Sequential(\n",
       "    (0): Transformer(\n",
       "      (embedding): Embedding(1683, 32, padding_idx=0)\n",
       "      (pos_encoder): PositionalEncoding(\n",
       "        (dropout): Dropout(p=0.1, inplace=False)\n",
       "      )\n",
       "      (encoder): Sequential(\n",
       "        (transformer_block0): TransformerEncoder(\n",
       "          (attn): MultiHeadedAttention(\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "            (q_proj): Linear(in_features=32, out_features=32, bias=False)\n",
       "            (kv_proj): Linear(in_features=32, out_features=64, bias=False)\n",
       "            (out_proj): Linear(in_features=32, out_features=32, bias=False)\n",
       "          )\n",
       "          (ff): FeedForward(\n",
       "            (w_1): Linear(in_features=32, out_features=128, bias=True)\n",
       "            (w_2): Linear(in_features=128, out_features=32, bias=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "            (activation): GELU(approximate='none')\n",
       "          )\n",
       "          (attn_addnorm): AddNorm(\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "            (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n",
       "          )\n",
       "          (ff_addnorm): AddNorm(\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "            (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n",
       "          )\n",
       "        )\n",
       "        (transformer_block1): TransformerEncoder(\n",
       "          (attn): MultiHeadedAttention(\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "            (q_proj): Linear(in_features=32, out_features=32, bias=False)\n",
       "            (kv_proj): Linear(in_features=32, out_features=64, bias=False)\n",
       "            (out_proj): Linear(in_features=32, out_features=32, bias=False)\n",
       "          )\n",
       "          (ff): FeedForward(\n",
       "            (w_1): Linear(in_features=32, out_features=128, bias=True)\n",
       "            (w_2): Linear(in_features=128, out_features=32, bias=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "            (activation): GELU(approximate='none')\n",
       "          )\n",
       "          (attn_addnorm): AddNorm(\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "            (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n",
       "          )\n",
       "          (ff_addnorm): AddNorm(\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "            (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n",
       "          )\n",
       "        )\n",
       "      )\n",
       "    )\n",
       "    (1): Linear(in_features=23552, out_features=1683, bias=True)\n",
       "  )\n",
       ")"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wide_deep_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And as in the previous notebook, let's train (you will need a GPU for this)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "epoch 1:   0%|                                                                                                         | 0/147 [00:34<?, ?it/s]\n"
     ]
    }
   ],
   "source": [
    "trainer = Trainer(\n",
    "    model=wide_deep_model,\n",
    "    objective=\"multiclass\",\n",
    "    custom_loss_function=nn.CrossEntropyLoss(ignore_index=PAD_IDX),\n",
    "    optimizers=torch.optim.Adam(wide_deep_model.parameters(), lr=1e-3),\n",
    ")\n",
    "\n",
    "trainer.fit(\n",
    "    X_train={\n",
    "        \"X_tab\": X_train_tab,\n",
    "        \"X_text\": X_train_text,\n",
    "        \"target\": y_train,\n",
    "    },\n",
    "    X_val={\n",
    "        \"X_tab\": X_test_tab,\n",
    "        \"X_text\": X_test_text,\n",
    "        \"target\": y_test,\n",
    "    },\n",
    "    n_epochs=10,\n",
    "    batch_size=521,\n",
    "    shuffle=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
